Approach

Unconstrained Optimization

  1. Use a combination of prediction and interpolation to predict all values of time and accuracy under different hardware/software configurations.
  2. Find the combination of algorithm inputs that maximizes accuracy. If there are ties, break them by choosing the point that requires the least data.
  3. Find the costs associated with running the algorithm with those inputs on all different hardware configurations.
  4. Find the combination of hardware that jointly minimizes cost and time.
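The four steps above can be sketched in R on a synthetic candidate grid. The accuracy, time, and cost formulas below are made-up stand-ins for the fitted prediction models, chosen only to make the procedure concrete.

```r
## Sketch of the unconstrained procedure on a synthetic candidate grid.
## The accuracy/time/cost formulas are illustrative stand-ins for the
## fitted prediction models.
grid <- expand.grid(nTrain = c(100, 500, 1000), nCov = c(2, 5),
                    cores = 1:4, memoryGB = c(2, 8, 16))
## Step 1: predicted accuracy, time, and cost for every configuration
grid$accuracy <- with(grid, 0.5 + 0.2 * (nTrain / 1000) + 0.05 * (nCov / 5))
grid$seconds  <- with(grid, nTrain * nCov / (cores * 50))
grid$cents    <- with(grid, seconds * (0.5 * cores + 0.05 * memoryGB) / 10)
## Step 2: accuracy-maximizing inputs, ties broken by least data
cand <- grid[grid$accuracy == max(grid$accuracy), ]
cand <- cand[order(cand$nTrain), ][1, c("nTrain", "nCov")]
## Steps 3-4: cost and time of those inputs on every hardware option,
## minimizing a standardized euclidean distance from the origin
hw <- grid[grid$nTrain == cand$nTrain & grid$nCov == cand$nCov, ]
hw$distance <- sqrt((hw$seconds / sd(hw$seconds))^2 +
                    (hw$cents   / sd(hw$cents))^2)
optimal <- hw[which.min(hw$distance), ]
```

The same skeleton carries through the constrained variants below; only the subsetting in step 2 changes.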

Data-constrained Optimization

  1. Use a combination of prediction and interpolation to predict all values of time and accuracy under different hardware/software configurations.
  2. Subset the accuracy surface produced above to the amount of data available. The maximizing point will fall in the upper-right corner of the subsetted space.
  3. Find the costs associated with running the algorithms with the accuracy-maximizing point on all different hardwares.
  4. Find the combination of hardware that jointly minimizes time and cost.
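The data-constrained subset in step 2 can be sketched as follows. The surface and the cap of 450 training examples are synthetic, chosen only to show the maximizer landing in the upper-right corner of the subsetted space.

```r
## Sketch of the data-constrained subset: keep only the configurations
## that fit within the available data, then take the accuracy maximizer.
## The surface below is a synthetic stand-in for the predicted one.
surface <- expand.grid(nTrain = seq(50, 1000, by = 50), nCov = 1:5)
surface$accuracy <- with(surface, 0.5 + 0.2 * (nTrain / 1000) + 0.05 * (nCov / 5))
available <- surface[surface$nTrain <= 450, ]  # e.g. only 450 records on hand
best <- available[which.max(available$accuracy), ]
## with a monotone surface, `best` sits at the upper-right corner
## of the subsetted space
```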

Cost-constrained Optimization

  1. Use a combination of prediction and interpolation to predict all values of time and accuracy under different hardware/software configurations.
  2. Subset the space produced above to the amount of time and money that can be spent on modelling.
  3. Working backwards now, find the accuracies that can be produced in the limited time.
  4. Using the subset of accuracy space, find the combination of algorithm inputs that maximizes accuracy.
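The cost-constrained path can be sketched the same way; the budgets mirror the 10-second / 25-cent example worked through later, while the surface and pricing formulas are synthetic.

```r
## Sketch of the cost-constrained path: subset the predicted surface by
## the time and money budgets, then maximize accuracy within what is
## feasible. Accuracy, time, and pricing formulas are synthetic.
surface <- expand.grid(nTrain = seq(100, 1000, by = 100), nCov = 1:5)
surface$accuracy <- with(surface, 0.5 + 0.2 * (nTrain / 1000) + 0.05 * (nCov / 5))
surface$seconds  <- with(surface, nTrain * nCov / 500)
surface$cents    <- surface$seconds * 3
feasible <- surface[surface$seconds <= 10 & surface$cents <= 25, ]
best <- feasible[which.max(feasible$accuracy), ]
## the budget excludes the global maximizer, so `best` trades away data
## or covariates to stay under the constraint
```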
knitr::opts_chunk$set(cache=TRUE, echo=F, warning=F, error = F, message=F)
knitr::opts_knit$set(root.dir = "/users/scottsfarley/documents")
setwd("/users/scottsfarley/documents")
library(parallel)
library(doParallel)
## Loading required package: foreach
## Loading required package: iterators
library(akima)
library(ggplot2)
options(java.parameters = "-Xmx1500m")
library(bartMachine)
## Loading required package: rJava
## Loading required package: bartMachineJARs
## Loading required package: car
## Warning: replacing previous import 'lme4::sigma' by 'stats::sigma' when
## loading 'pbkrtest'
## Loading required package: randomForest
## randomForest 4.6-12
## Type rfNews() to see new features/changes/bug fixes.
## 
## Attaching package: 'randomForest'
## The following object is masked from 'package:ggplot2':
## 
##     margin
## Loading required package: missForest
## Loading required package: itertools
## Welcome to bartMachine v1.2.3! You have 1.4GB memory available.
## 
## If you run out of memory, restart R, and use e.g.
## 'options(java.parameters = "-Xmx5g")' for 5GB of RAM before you call
## 'library(bartMachine)'.
bartMachine::set_bart_machine_num_cores(3)
## bartMachine now using 3 cores.
library(reshape2)
library(ggdendro)

threshold.time <- 22 ##seconds
threshold.cost <- 30 ##cents
threshold.numTex <- 45

First, get the training data and fit the model. Perform some skill checks on it.

## bartMachine initializing with 50 trees...
## bartMachine vars checked...
## bartMachine java init...
## bartMachine factors created...
## bartMachine before preprocess...
## bartMachine after preprocess... 6 total features...
## bartMachine sigsq estimated...
## bartMachine training data finalized...
## Now building bartMachine for regression ...
## serializing in order to be saved for future R sessions...done
## bartMachine initializing with 50 trees...
## bartMachine vars checked...
## bartMachine java init...
## bartMachine factors created...
## bartMachine before preprocess...
## bartMachine after preprocess... 6 total features...
## bartMachine sigsq estimated...
## bartMachine training data finalized...
## Now building bartMachine for regression ...
## serializing in order to be saved for future R sessions...done

Choose a finite number of possible solutions to the model. Ideally, we would want every single combination of predictor variables [0, Inf]. This is obviously intractable. Moreover, I only have data for a subset of that space anyway. So randomly sample the subspace in which I have data to make the problem possible to solve.

Using that subset of data and the models we fit previously, predict each candidate configuration of algorithm inputs and hardware variables for execution time and SDM accuracy.

Plot the posterior means of the accuracy models against the algorithm inputs that should control accuracy. In this case, these are number of training examples and number of covariates.

The accuracy clearly varies from low (few training examples and few covariates) to very high (many training examples, many covariates). Perhaps more data would be helpful here, but we work with what we have. Our task is to find the combination of inputs that results in the highest-accuracy model. If there’s a tie, find the combination that needs the least data.

Now, we know the combination of algorithm inputs that results in the highest accuracy. The figure below shows the identified combination on the training examples and covariates axes. This combination of training examples and number of covariates can be run on any combination of hardware, though some may be suboptimal. At this point, we’ve solved half of our challenge: the algorithm inputs have been optimized; now it’s time to optimize the hardware.

## [1] "Accuracy is maximized at 1000 training examples and 5 predictors."

In theory, the hardware parameters should not affect the SDM accuracy. We can test this assumption here by plotting the accuracies obtained for this combination of algorithm inputs against the number of CPUs and amount of memory. If the assumption is valid, the plot should show no change in either the horizontal or vertical direction. We see that there is, in fact, some change. This is likely due to the experimental design, which lacked a full factorial setup. The effect is relatively minor, so I note it and move on.
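The shape of this check can be sketched as below; the accuracy values are simulated with a small jitter around the fixed point, standing in for the model’s predictions across hardware.

```r
## Sketch of the hardware-invariance check: predicted accuracy at the
## fixed algorithm inputs, evaluated across hardware, should be nearly
## flat. The accuracies here are simulated stand-ins.
set.seed(1)
hw.grid <- expand.grid(cores = 1:8, memoryGB = c(2, 8, 16))
hw.grid$accuracy <- 0.772 + rnorm(nrow(hw.grid), sd = 0.005)
acc.range <- diff(range(hw.grid$accuracy))
## a range that is small relative to the accuracy scale (here ~0.77)
## supports treating accuracy as independent of hardware
```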

## [1] "Accuracy Range on Hardware:  0.0445015199295271"
## [1] "Accuracy Range from Expectation:  0"
## [1] "------"
## [1] "Fixing accuracy at:  0.772216784759339"

Now, fix the algorithm inputs at the accuracy-maximizing point, effectively fixing the expected model accuracy. An algorithm with these inputs can be run on any combination of hardware. Project how long that specific model would take, and how much it would cost, on all computing types. Plot those out on time vs. cost axes.

The optimal solution is the one that balances time and cost equally during the minimization. We use a standardized euclidean distance here, normalizing each dimension by its standard deviation so the two are weighted equally. For each candidate combination of hardware, we calculate the distance between it and the origin of these two axes. We then find the minimum of those distances and call that point the optimal.
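The minimization itself is compact; the numbers below are illustrative, loosely echoing the magnitudes in the tables further down.

```r
## Sketch of the time/cost minimization: scale each axis by its standard
## deviation so seconds and cents are weighted equally, then pick the
## candidate closest to the origin. The values are illustrative.
hw <- data.frame(cores   = c(2, 7, 16, 24),
                 seconds = c(17.8, 9.1, 9.1, 94.2),
                 cents   = c(6.1, 10.9, 24.8, 340.0))
hw$distance <- sqrt((hw$seconds / sd(hw$seconds))^2 +
                    (hw$cents   / sd(hw$cents))^2)
optimal <- hw[which.min(hw$distance), ]
```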

Our job is complete. We’ve now optimized both the hardware and software dimensions of the problem.

## [1] "------MARS OPTIMAL--------"
## [1] "Predicted Optimal Accuracy 0.772216784759339 +/- 0"
## [1] "Predicted Optimal Cost (seconds) 9.11137224465298"
## [1] "Predicted Optimal Cost (cents) 10.8642180370793"
## [1] "Cores:  7"
## [1] "Memory: 16"
## [1] "Training Examples: 1000"
## [1] "Covariates: 5"

Everything up to this point was done using the mean of the posterior distribution, a choice which simplifies the process but causes some information loss and may cause over-confidence in the predictions. We can modify our steps to include information from the entire posterior, which may solve this issue.

Instead of projecting just the mean time and mean cost for use in the distance minimization, use the entire set of posterior samples. Calculate the distance metric for each sample in the posterior independently. You’re then left with a density distribution of distances, from which we can infer the minimum value.
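The per-sample calculation looks like the following; the draws here are simulated rather than pulled from the fitted BART posteriors, and scaling by the spread of the draws themselves is a simplification.

```r
## Sketch of the full-posterior distance: compute the metric for every
## posterior draw rather than only the posterior mean, giving a
## distribution of distances for one candidate. Draws are simulated.
set.seed(42)
n.draws <- 500
seconds.draws <- rnorm(n.draws, mean = 9.1, sd = 1.0)
cents.draws   <- rnorm(n.draws, mean = 10.9, sd = 1.2)
distances <- sqrt((seconds.draws / sd(seconds.draws))^2 +
                  (cents.draws   / sd(cents.draws))^2)
distance.mean <- mean(distances)
distance.sd   <- sd(distances)
## summarizing the density of `distances`, not a single mean projection,
## carries the posterior uncertainty into the minimization
```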

The posteriors are in a line, since there’s a fixed linear relationship between time and cost.

Now, find the distance metrics for all of those points.

There’s a lot of overlap in this figure, and many points are far away from the optimal. We don’t care about those. Take the few closest to the minimum and look at their distributions.

Now, the optimal configuration may be one of the following:

config cores GBMemory seconds cost distance.mean distance.sd
81 7 16 9.111372 10.864218 14.28928 1.688315
93 8 16 9.111372 12.416249 15.52028 1.833760
105 9 16 9.111372 13.968280 16.80676 1.985762
117 10 16 9.111372 15.520312 18.13693 2.142924
129 11 16 9.111372 17.072343 19.50185 2.304192
141 12 16 9.111372 18.624374 20.89470 2.468761
153 13 16 9.111372 20.176405 22.31026 2.636013
165 14 16 9.111372 21.728436 23.74446 2.805468
177 15 16 9.111372 23.280467 25.19413 2.976750
21 2 16 17.802669 6.065013 25.45145 20.409807
189 16 16 9.111372 24.832498 26.65673 3.149560
33 3 16 18.159782 9.280012 27.80320 22.442834
200 17 14 94.243809 240.802355 28.13026 3.323661
212 18 14 94.243809 254.967200 29.61307 3.498860
45 4 16 18.159782 12.373349 29.95855 24.182636
57 5 16 17.214917 14.661945 31.00781 25.254975
224 19 14 94.243809 269.132044 31.10385 3.674999
236 20 14 94.243809 283.296889 32.60150 3.851950
69 6 16 17.214917 17.594333 33.75418 27.491819
248 21 14 94.243809 297.461733 34.10512 4.029606
260 22 14 94.243809 311.626577 35.61394 4.207877
272 23 14 94.243809 325.791422 37.12734 4.386689
284 24 14 94.243809 339.956266 38.64477 4.565977
9 1 16 26.358748 4.489949 43.84453 41.235970
73 7 2 123.674542 26.023597 135.34560 44.687413

config cores GBMemory seconds cost distance.mean distance.sd cluster
81 7 16 9.111372 10.86422 14.28928 1.688315 1
93 8 16 9.111372 12.41625 15.52028 1.833760 1
105 9 16 9.111372 13.96828 16.80676 1.985762 1
117 10 16 9.111372 15.52031 18.13693 2.142924 1
129 11 16 9.111372 17.07234 19.50185 2.304192 1
141 12 16 9.111372 18.62437 20.89470 2.468761 1
153 13 16 9.111372 20.17640 22.31026 2.636013 1
165 14 16 9.111372 21.72844 23.74446 2.805468 1
177 15 16 9.111372 23.28047 25.19413 2.976750 1
189 16 16 9.111372 24.83250 26.65673 3.149560 1

In the results above, you’re actually seeing the trade-off between time and money play out quite nicely. Adding cores costs money but, in the case of MARS, reduces time. Here, that trade-off almost exactly evens out.

Cost-Constrained Optimization

There are two main types of constraints on this optimization problem: (1) limited time and/or money and (2) limited data. In the first case, the researcher only has so much time or money (or both) available to be spent on modelling. She must optimize her workflow so that she can get the most out of the limited funds available to her. For one experiment, this doesn’t really hold water, since only cents and seconds are being spent on computing the models. However, when we think about global syntheses with many species, these add up to significant expenditures.

In this simple example, the researcher has a hard maximum of 10 seconds and 25 cents to be spent on the model.

First, calculate a surface of all the possible configurations, whether or not they meet her threshold.

Now, subset that surface, retaining only the configurations that satisfy her threshold in time and in money. Find the maximum amount of data that can be used, and calculate all the possible accuracies that could be achieved using it. This down-weights expensive computing types, and encourages the solution to have a high accuracy.

## [1] "There are  1251 candidates."

Notice the change in scales. We’re no longer able to reach a point with more than 4000 training examples. Instead, we’re limited to less data, and lower accuracy, because of the time/money constraint.

## [1] "Accuracy is maximized at 130 training examples and 5 predictors."

## [1] "Expected accuracy in this scenario is:  0.701559387234422"
## [1] "Fixing training examples at:  130"
## [1] "Fixing covariates at:  5"

Finally, come back around, and find the computing hardware that’s best for these inputs.

## [1] "Now there are only:  287 candidates, instead of  287 candidates that can complete this scenario under budget."

## [1] "Recommended # cores:  7"
## [1] "Recommended Memory:  16"
## [1] "Expected Cost:  6.89095292443005"
## [1] "Expected Seconds:  5.7791584263658"

config cores GBMemory seconds cost distance.mean distance.sd
81 7 16 5.779158 6.890953 9.591231 3.492383
93 8 16 5.779158 7.875375 10.417498 3.793245
105 9 16 5.779158 8.859797 11.281010 4.107669
117 10 16 5.779158 9.844219 12.173842 4.432769
129 11 16 5.779158 10.828640 13.089997 4.766362
21 2 16 10.684662 3.640051 13.163162 7.060324
141 12 16 5.779158 11.813062 14.024905 5.106783
33 3 16 10.900275 5.570258 14.353940 7.783501
153 13 16 5.779158 12.797484 14.975054 5.452754
45 4 16 10.900275 7.427011 15.466679 8.386889
165 14 16 5.779158 13.781906 15.937718 5.803281
57 5 16 10.333547 8.801082 15.960514 8.776853
177 15 16 5.779158 14.766328 16.910761 6.157588
69 6 16 10.333547 10.561298 17.374144 9.554222
189 16 16 5.779158 15.750749 17.892488 6.515057
200 17 14 39.430990 100.750122 18.881546 6.875195
212 18 14 39.430990 106.676600 19.876840 7.237604
224 19 14 39.430990 112.603078 20.877479 7.601959
236 20 14 39.430990 118.529556 21.882728 7.967993
9 1 16 15.819972 2.694774 22.507541 15.417707
248 21 14 39.430990 124.456033 22.891981 8.335484
260 22 14 39.430990 130.382511 23.904731 8.704249
272 23 14 39.430990 136.308989 24.920551 9.074132
284 24 14 39.430990 142.235467 25.939081 9.445002
73 7 2 51.664022 10.871144 56.449660 18.427675

config cores GBMemory seconds cost distance.mean distance.sd cluster
81 7 16 5.779158 6.890953 9.591231 3.492383 1
93 8 16 5.779158 7.875375 10.417498 3.793245 1
105 9 16 5.779158 8.859797 11.281010 4.107669 1
117 10 16 5.779158 9.844219 12.173842 4.432769 1
129 11 16 5.779158 10.828640 13.089997 4.766362 1
141 12 16 5.779158 11.813062 14.024905 5.106783 1
153 13 16 5.779158 12.797484 14.975054 5.452754 1
165 14 16 5.779158 13.781906 15.937718 5.803281 1
177 15 16 5.779158 14.766328 16.910761 6.157588 1
189 16 16 5.779158 15.750749 17.892488 6.515057 1

Data-constrained Optimization

In this case, we’ve got a constraint on the amount of data available to us.

## [1] "Current data threshold is  45"
## [1] "Accuracy is maximized at 44 training examples and 5 predictors."
## [1] "Expected Max Accuracy is  0.694574862881384"
## [1] "Now there are only:  287 candidates, instead of  287 candidates that can complete this scenario under budget."

## [1] "Recommended # cores:  7"
## [1] "Recommended Memory:  16"
## [1] "Expected Cost:  0.698721408620309"
## [1] "Expected Seconds:  0.585988869840411"

Finally, come back around, and find the computing hardware that’s best for these inputs.

## [1] "Now there are only:  287 candidates, instead of  287 candidates that can complete this scenario under budget."

## [1] "Recommended # cores:  7"
## [1] "Recommended Memory:  16"
## [1] "Expected Cost:  2.14242888761311"
## [1] "Expected Seconds:  1.79676687600691"

config cores GBMemory seconds cost distance.mean distance.sd
81 7 16 1.796767 2.1424289 3.580441 2.695402
93 8 16 1.796767 2.4484902 3.888889 2.927606
21 2 16 3.274177 1.1154465 4.152403 2.109742
105 9 16 1.796767 2.7545514 4.211241 3.170276
117 10 16 1.796767 3.0606127 4.544539 3.421187
33 3 16 3.380755 1.7276335 4.577421 2.314046
129 11 16 1.796767 3.3666740 4.886543 3.678652
45 4 16 3.380755 2.3035113 4.932269 2.493434
57 5 16 3.211710 2.7354130 5.080262 2.563458
141 12 16 1.796767 3.6727352 5.235547 3.941387
69 6 16 3.211710 3.2824957 5.530223 2.790505
153 13 16 1.796767 3.9787965 5.590241 4.208405
165 14 16 1.796767 4.2848578 5.949607 4.478940
177 15 16 1.796767 4.5909190 6.312847 4.752392
189 16 16 1.796767 4.8969803 6.679330 5.028285
200 17 14 9.751380 24.9157520 7.048549 5.306237
9 1 16 4.808306 0.8190469 7.102528 4.399200
212 18 14 9.751380 26.3813845 7.420095 5.585943
224 19 14 9.751380 27.8470170 7.793637 5.867150
236 20 14 9.751380 29.3126495 8.168900 6.149653
248 21 14 9.751380 30.7782819 8.545658 6.433281
260 22 14 9.751380 32.2439144 8.923721 6.717892
272 23 14 9.751380 33.7095469 9.302931 7.003365
284 24 14 9.751380 35.1751794 9.683151 7.289600
73 7 2 12.299971 2.5881600 13.450807 4.425395

config cores GBMemory seconds cost distance.mean distance.sd cluster
81 7 16 1.796767 2.142429 3.580441 2.695402 1
93 8 16 1.796767 2.448490 3.888889 2.927606 1
21 2 16 3.274177 1.115447 4.152403 2.109742 1
105 9 16 1.796767 2.754551 4.211241 3.170276 1
117 10 16 1.796767 3.060613 4.544539 3.421187 1
33 3 16 3.380755 1.727634 4.577421 2.314046 1
45 4 16 3.380755 2.303511 4.932269 2.493434 1
57 5 16 3.211710 2.735413 5.080262 2.563458 1
69 6 16 3.211710 3.282496 5.530223 2.790505 1